A Privacy-Preserving Federated Learning Method with Homomorphic Encryption in Omics Data

Negoya, Yusaku, Cui, Feifei, Zhang, Zilong, Pan, Miao, Ohtsuki, Tomoaki, Li, Aohan

arXiv.org Artificial Intelligence

Omics data is widely employed in medical research to identify disease mechanisms and contains highly sensitive personal information. Federated Learning (FL) with Differential Privacy (DP) can protect omics data privacy against malicious attacks. However, FL with DP faces an inherent trade-off: stronger privacy protection degrades predictive accuracy due to the injected noise. Homomorphic Encryption (HE), on the other hand, allows computation on encrypted data and enables aggregation of encrypted gradients without DP-induced noise, which can increase predictive accuracy; however, it may increase the computation cost. To improve predictive accuracy while accounting for the computational abilities of heterogeneous clients, we propose a Privacy-Preserving Machine Learning (PPML)-Hybrid method that introduces HE. In the proposed PPML-Hybrid method, distributed clients select either HE or DP based on their computational resources, so that HE clients contribute noise-free updates while DP clients reduce computational overhead. Meanwhile, clients with high computational resources can flexibly adopt HE or DP according to their privacy needs. Performance evaluations on omics datasets show that our proposed method achieves comparable predictive accuracy while significantly reducing computation time relative to HE-only methods. Additionally, it outperforms DP-only methods under equivalent or stricter privacy budgets.
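The per-client selection the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's implementation: the clipping bound, noise scale, and client mode assignment are invented here, and the HE path is simulated as a noise-free update (a real system would encrypt the clipped gradient with a scheme such as Paillier or CKKS and aggregate ciphertexts server-side).

```python
import numpy as np

rng = np.random.default_rng(0)

def client_update(grad, mode, sigma=0.8, clip=1.0):
    # Clip the gradient to bound its norm (required for the DP guarantee).
    g = grad * min(1.0, clip / np.linalg.norm(grad))
    if mode == "DP":
        # Low-resource clients perturb locally via the Gaussian mechanism.
        return g + rng.normal(0.0, sigma * clip, size=g.shape)
    # "HE" clients would encrypt g and the server would sum ciphertexts;
    # the encryption step is elided here, so the update stays noise-free,
    # matching the accuracy argument in the abstract.
    return g

def aggregate(updates):
    # Server-side averaging of (decrypted or noisy) client updates.
    return np.mean(updates, axis=0)

grads = [rng.normal(size=4) for _ in range(6)]
modes = ["HE", "HE", "DP", "DP", "DP", "HE"]  # chosen per client resources
global_update = aggregate([client_update(g, m) for g, m in zip(grads, modes)])
```

In this sketch the server sees DP clients' noisy plaintext updates directly, while HE clients' contributions would only ever be visible in aggregate after decryption.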



Optimal Client Sampling in Federated Learning with Client-Level Heterogeneous Differential Privacy

Xu, Jiahao, Hu, Rui, Kotevska, Olivera

arXiv.org Artificial Intelligence

Federated Learning with client-level differential privacy (DP) provides a promising framework for collaboratively training models while rigorously protecting clients' privacy. However, classic approaches like DP-FedAvg struggle when clients have heterogeneous privacy requirements, as they must uniformly enforce the strictest privacy level across clients, leading to excessive DP noise and significant model utility degradation. Existing methods to improve the model utility in such heterogeneous privacy settings often assume a trusted server and are largely heuristic, resulting in suboptimal performance and lacking strong theoretical underpinnings. In this work, we address these challenges under a practical attack model where both clients and the server are honest-but-curious. We propose GDPFed, which partitions clients into groups based on their privacy budgets and achieves client-level DP within each group to reduce the privacy budget waste and hence improve the model utility. Based on the privacy and convergence analysis of GDPFed, we find that the magnitude of DP noise depends on both model dimensionality and the per-group client sampling ratios. To further improve the performance of GDPFed, we introduce GDPFed$^+$, which integrates model sparsification to eliminate unnecessary noise and optimizes per-group client sampling ratios to minimize convergence error. Extensive empirical evaluations on multiple benchmark datasets demonstrate the effectiveness of GDPFed$^+$, showing substantial performance gains compared with state-of-the-art methods.
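The grouping-and-sampling idea behind GDPFed can be sketched as below. The specific budgets and per-group ratios are illustrative assumptions; the paper derives the optimal ratios from its convergence analysis, which this sketch does not reproduce.

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(1)

def make_groups(budgets):
    """Partition client ids by their (discrete) privacy budget epsilon."""
    groups = defaultdict(list)
    for cid, eps in enumerate(budgets):
        groups[eps].append(cid)
    return dict(groups)

def sample_round(groups, ratios):
    """Independently sample clients within each group at that group's ratio,
    so client-level DP is enforced per group rather than uniformly."""
    selected = []
    for eps, members in groups.items():
        q = ratios[eps]
        selected += [cid for cid in members if rng.random() < q]
    return selected

budgets = [1.0, 1.0, 4.0, 4.0, 4.0, 8.0]   # heterogeneous client budgets
groups = make_groups(budgets)
ratios = {1.0: 0.5, 4.0: 0.8, 8.0: 1.0}    # looser budgets can sample more
chosen = sample_round(groups, ratios)
```

Grouping avoids forcing every client down to the strictest budget: each group's DP noise is calibrated to its own epsilon and sampling ratio.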


P3SL: Personalized Privacy-Preserving Split Learning on Heterogeneous Edge Devices

Fan, Wei, Yoon, JinYi, Li, Xiaochang, Shao, Huajie, Ji, Bo

arXiv.org Artificial Intelligence

Split Learning (SL) is an emerging privacy-preserving machine learning technique that enables resource-constrained edge devices to participate in model training by partitioning a model into client-side and server-side sub-models. While SL reduces computational overhead on edge devices, it encounters significant challenges in heterogeneous environments where devices vary in computing resources, communication capabilities, environmental conditions, and privacy requirements. Although recent studies have explored heterogeneous SL frameworks that optimize split points for devices with varying resource constraints, they often neglect personalized privacy requirements and local model customization under varying environmental conditions. To address these limitations, we propose P3SL, a Personalized Privacy-Preserving Split Learning framework designed for heterogeneous, resource-constrained edge device systems. The key contributions of this work are twofold. First, we design a personalized sequential split learning pipeline that allows each client to achieve customized privacy protection and maintain personalized local models tailored to their computational resources, environmental conditions, and privacy needs. Second, we adopt a bi-level optimization technique that empowers clients to determine their own optimal personalized split points without sharing private sensitive information (i.e., computational resources, environmental conditions, privacy requirements) with the server. We implement and evaluate P3SL on a testbed consisting of 7 devices, including 4 Jetson Nano P3450 devices, 2 Raspberry Pis, and 1 laptop, using diverse model architectures and datasets under varying environmental conditions. Experimental results demonstrate that P3SL significantly mitigates privacy leakage risks, reduces system energy consumption by up to 59.12%, and consistently retains high accuracy compared to state-of-the-art heterogeneous SL systems.
To protect data privacy, some research has proposed training entire machine learning models to process data locally [5]. However, training entire ML models on resource-constrained edge devices presents significant challenges, including high energy consumption and prolonged training durations.
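The core split-learning partition described above can be sketched in a few lines. The toy model, layer sizes, and split point here are arbitrary; in P3SL each client would choose its own split point via the bi-level optimization, which this sketch does not model.

```python
import numpy as np

rng = np.random.default_rng(2)

# A toy model as a list of layer functions; the split point k decides
# which layers run on the edge device and which on the server.
W = [rng.normal(scale=0.1, size=(8, 8)) for _ in range(4)]
layers = [lambda x, w=w: np.tanh(x @ w) for w in W]

def split_forward(x, layers, k):
    """Client runs layers[:k] and uploads only the smashed activation;
    the server finishes the forward pass with layers[k:]."""
    h = x
    for f in layers[:k]:          # client-side sub-model
        h = f(h)
    smashed = h                   # the only tensor that leaves the device
    for f in layers[k:]:          # server-side sub-model
        h = f(h)
    return smashed, h

x = rng.normal(size=(1, 8))
smashed, out = split_forward(x, layers, k=2)
```

A deeper split point (larger k) keeps more computation, and more information, on the device, which is exactly the resource/privacy trade-off the split-point choice controls.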


Generating Privacy Stories From Software Documentation

Baldwin, Wilder, Chintakuntla, Shashank, Parajuli, Shreyah, Pourghasemi, Ali, Shanz, Ryan, Ghanavati, Sepideh

arXiv.org Artificial Intelligence

Research shows that analysts and developers consider privacy as a security concept or as an afterthought, which may lead to non-compliance and violation of users' privacy. Most current approaches, however, focus on extracting legal requirements from regulations and evaluating the compliance of software and processes with them. In this paper, we develop a novel approach based on chain-of-thought (CoT) prompting, in-context learning (ICL), and Large Language Models (LLMs) to extract privacy behaviors from various software documents prior to and during software development, and then generate privacy requirements in the format of user stories. Our results show that most commonly used LLMs, such as GPT-4o and Llama 3, can identify privacy behaviors and generate privacy user stories with F1 scores exceeding 0.8. We also show that the performance of these models can be improved through parameter tuning. Our findings provide insight into using and optimizing LLMs for generating privacy requirements from software documents created prior to or throughout the software development lifecycle. Understanding the privacy behaviors of software applications and eliciting privacy requirements during the early phases of the software development lifecycle (SDLC) are essential for developing privacy-preserving and regulatory-compliant software [1], [2]. Past research, however, shows that software analysts and developers often consider privacy as a subset of security requirements or as an afterthought [3], [4], and they often lack the tools needed to understand and identify the privacy behaviors of the applications they develop [5], [6]. Most common approaches for identifying and eliciting privacy requirements include conducting privacy impact assessments [7], [8], or employing goal-oriented methodologies to map privacy requirements to system processes [8]-[10].
Other works aim to extract privacy-related information from user stories or use case models [11]-[17] by leveraging Natural Language Processing (NLP) techniques and then using predefined templates to generate privacy requirements. However, these approaches mostly focus on the specific forms of software documentation (i.e., user stories or use cases), or they rely on developers to understand how personal information is handled by their applications.
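A CoT-plus-ICL prompt of the kind the paper describes can be assembled as below. The instruction wording and the single few-shot example (document excerpt, reasoning, user story) are invented here purely for illustration; the paper's actual prompts and exemplars may differ.

```python
# Hypothetical in-context example: a (document, reasoning, story) triple.
ICL_EXAMPLES = [
    {
        "doc": "The app stores the user's email address to send receipts.",
        "reasoning": "The text describes collection and storage of a "
                     "personal identifier (email) for a stated purpose.",
        "story": "As a user, I want my email address to be stored only "
                 "for sending receipts, so that it is not reused for "
                 "other purposes.",
    },
]

def build_prompt(document_excerpt):
    """Assemble a chain-of-thought instruction plus in-context examples
    into a single prompt ending where the model should start reasoning."""
    parts = ["Identify privacy behaviors in the text, reason step by "
             "step, then write a privacy user story.\n"]
    for ex in ICL_EXAMPLES:
        parts.append(f"Text: {ex['doc']}\n"
                     f"Reasoning: {ex['reasoning']}\n"
                     f"Privacy story: {ex['story']}\n")
    parts.append(f"Text: {document_excerpt}\nReasoning:")
    return "\n".join(parts)

prompt = build_prompt("The service logs IP addresses for analytics.")
```

The prompt deliberately ends at "Reasoning:" so the model produces its chain of thought before emitting the user story, mirroring the CoT setup.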


Optimizing QoE-Privacy Tradeoff for Proactive VR Streaming

Wei, Xing, Han, Shengqian, Yang, Chenyang, Sun, Chengjian

arXiv.org Artificial Intelligence

Proactive virtual reality (VR) streaming requires users to upload viewpoint-related information, raising significant privacy concerns. Existing strategies preserve privacy by introducing errors to viewpoints, which, however, compromises the quality of experience (QoE) of users. In this paper, we first delve into the analysis of the viewpoint leakage probability achieved by existing privacy-preserving approaches. We determine the optimal distribution of viewpoint errors that minimizes the viewpoint leakage probability. Our analyses show that existing approaches cannot fully eliminate viewpoint leakage. Then, we propose a novel privacy-preserving approach that introduces noise to uploaded viewpoint prediction errors, which can ensure zero viewpoint leakage probability. Given the proposed approach, the tradeoff between privacy preservation and QoE is optimized to minimize the QoE loss while satisfying the privacy requirement. Simulation results validate our analysis results and demonstrate that the proposed approach offers a promising solution for balancing privacy and QoE.
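The key mechanism, perturbing the viewpoint prediction error rather than the viewpoint itself, can be sketched as follows. The viewpoint encoding, noise distribution, and noise scale are illustrative assumptions; the paper derives the optimal error distribution, which this sketch does not.

```python
import numpy as np

rng = np.random.default_rng(3)

def upload_viewpoint(true_vp, predicted_vp, noise_std=0.05):
    """Instead of uploading the raw viewpoint, the client uploads the
    prediction error perturbed with noise, so the server (which knows
    its own prediction) recovers only a noisy viewpoint estimate."""
    error = true_vp - predicted_vp
    return error + rng.normal(0.0, noise_std, size=error.shape)

true_vp = np.array([0.30, 0.62])        # e.g. normalized yaw/pitch
predicted_vp = np.array([0.28, 0.60])   # server-side prediction
noisy_error = upload_viewpoint(true_vp, predicted_vp)
reconstructed = predicted_vp + noisy_error  # what the server can infer
```

Larger noise_std lowers the viewpoint leakage probability but degrades QoE, since the server prefetches tiles for a less accurate viewpoint; that is the trade-off the paper optimizes.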


Federated Learning With Individualized Privacy Through Client Sampling

Lange, Lucas, Borchardt, Ole, Rahm, Erhard

arXiv.org Artificial Intelligence

With growing concerns about user data collection, individualized privacy has emerged as a promising solution to balance protection and utility by accounting for diverse user privacy preferences. Instead of enforcing a uniform level of anonymization for all users, this approach allows individuals to choose privacy settings that align with their comfort levels. Building on this idea, we propose an adapted method for enabling Individualized Differential Privacy (IDP) in Federated Learning (FL) by handling clients according to their personal privacy preferences. By extending the SAMPLE algorithm from centralized settings to FL, we calculate client-specific sampling rates based on their heterogeneous privacy budgets and integrate them into a modified IDP-FedAvg algorithm. We test this method under realistic privacy distributions and multiple datasets. The experimental results demonstrate that our approach achieves clear improvements over uniform DP baselines, reducing the trade-off between privacy and utility. Compared to the alternative SCALE method in related work, which assigns differing noise scales to clients, our method performs notably better. However, challenges remain for complex tasks with non-i.i.d. data, primarily stemming from the constraints of the decentralized setting.
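The idea of turning heterogeneous budgets into client-specific sampling rates can be sketched as below. The linear budget-to-rate scaling is a deliberate simplification of the SAMPLE-style calibration the paper adapts, and the budget values are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)

def sampling_rates(budgets, base_rate=0.5):
    """Map heterogeneous budgets to per-client sampling probabilities:
    the strictest budget gets the base rate, looser budgets sample more
    often (capped at 1). Linear scaling is a simplifying assumption."""
    eps = np.asarray(budgets, dtype=float)
    return np.minimum(1.0, base_rate * eps / eps.min())

def sample_clients(rates):
    # Poisson sampling: each client participates independently.
    return [i for i, q in enumerate(rates) if rng.random() < q]

budgets = [1.0, 2.0, 2.0, 8.0]    # per-client privacy budgets
rates = sampling_rates(budgets)   # strictest client sampled least often
chosen = sample_clients(rates)
```

Sampling strict-budget clients less often amplifies their privacy, so all clients can share one noise scale in the modified FedAvg round instead of being forced to the most pessimistic setting.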


Privacy in Metalearning and Multitask Learning: Modeling and Separations

Aliakbarpour, Maryam, Bairaktari, Konstantina, Smith, Adam, Swanberg, Marika, Ullman, Jonathan

arXiv.org Artificial Intelligence

Model personalization allows a set of individuals, each facing a different learning task, to train models that are more accurate for each person than those they could develop individually. For example, consider a set of people, each of whom holds a relatively small dataset of photographs labeled with the names of their loved ones that appear in each picture. Each person would like to build a classifier that labels future pictures with the names of people in the picture, but training such an image classifier would take more data than any individual person has. Even though the tasks they want to carry out are different--their photos have different subjects--those tasks share a lot of common structure. By pooling their data, a large group of people could learn the shared components of a good set of classifiers. Each individual could then train the subject-specific components on their own, requiring only a few examples for each subject. Other applications of personalization include next-word prediction on a mobile keyboard, speech recognition, and recommendation systems. The goals of personalization are captured in a variety of formal frameworks, such as multitask learning and metalearning.


Personalized Differential Privacy for Ridge Regression

Acharya, Krishna, Boenisch, Franziska, Naidu, Rakshit, Ziani, Juba

arXiv.org Artificial Intelligence

The increased application of machine learning (ML) in sensitive domains requires protecting the training data through privacy frameworks, such as differential privacy (DP). DP requires specifying a uniform privacy level $\varepsilon$ that expresses the maximum privacy loss that each data point in the entire dataset is willing to tolerate. Yet, in practice, different data points often have different privacy requirements. Having to set one uniform privacy level is usually too restrictive, often forcing a learner to guarantee the most stringent privacy requirement, at a large cost to accuracy. To overcome this limitation, we introduce our novel Personalized-DP Output Perturbation method (PDP-OP), which enables training Ridge regression models with individual per-data-point privacy levels. We provide rigorous privacy proofs for PDP-OP as well as accuracy guarantees for the resulting model. This work is the first to provide such theoretical accuracy guarantees for personalized DP in machine learning, whereas previous work only provided empirical evaluations. We empirically evaluate PDP-OP on synthetic and real datasets with diverse privacy distributions. We show that by enabling each data point to specify its own privacy requirement, we can significantly improve the privacy-accuracy trade-offs in DP. We also show that PDP-OP outperforms the personalized privacy techniques of Jorgensen et al. (2015).
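The shape of an output-perturbation approach with per-point privacy can be sketched as below. This is only a structural illustration, not PDP-OP itself: the per-point weighting rule, noise scale, and regularization strength are stand-in assumptions, whereas the paper calibrates noise to the individual $\varepsilon_i$ with formal proofs.

```python
import numpy as np

rng = np.random.default_rng(5)

def personalized_ridge(X, y, eps, lam=1.0, noise_scale=0.1):
    """Simplified sketch: down-weight each point by its budget relative to
    the loosest one (a stand-in for the paper's exact calibration), solve
    the weighted ridge problem in closed form, then perturb the released
    coefficients (output perturbation)."""
    w = np.asarray(eps, dtype=float) / max(eps)   # per-point weights in (0, 1]
    Xw = X * w[:, None]                           # rows scaled by weight
    theta = np.linalg.solve(Xw.T @ X + lam * np.eye(X.shape[1]), Xw.T @ y)
    return theta + rng.normal(0.0, noise_scale, size=theta.shape)

X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)
eps = rng.uniform(0.5, 4.0, size=50)              # heterogeneous budgets
theta_priv = personalized_ridge(X, y, eps)
```

Points with strict budgets influence the solution less, so the added noise can be calibrated to looser effective sensitivity than a uniform worst-case epsilon would allow.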